DumpWatch AI: An Automated Illegal Dumping Detection System Using YOLOv8 and Vision Language Model Verification

Authors: Mrs. Sankari, Manish R, Monish A

DOI Link: https://doi.org/10.22214/ijraset.2026.80955

Abstract

Illegal dumping of waste in unauthorized locations poses serious environmental, public health, and urban management challenges. Conventional CCTV-based monitoring relies on manual review, which is time-consuming, inconsistent, and impractical for large-scale deployment. There is a growing need for automated, intelligent systems that can detect dumping events in real time without human intervention. This paper presents DumpWatch AI, an automated surveillance system designed to detect illegal dumping incidents in CCTV footage. The system employs YOLOv8 [1], a state-of-the-art real-time object detection model, to track persons, vehicles, and potential waste objects across video frames. A temporal persistence logic identifies objects that remain stationary at a location for more than five seconds after the associated person or vehicle has departed, flagging them as potential dumping events. To reduce false positives and enrich incident data, flagged frames are submitted to Gemini 2.0 Flash [2], a Vision Language Model (VLM), which performs semantic reasoning to classify the event type (Household, Industrial, Furniture), assess severity (LOW, MEDIUM, HIGH), and generate a natural-language summary. Additionally, EasyOCR [3] extracts license plate numbers from vehicle crops for evidentiary logging. All incidents are stored in a structured SQLite database and an annotated MP4 output is produced for review. Experimental evaluations demonstrate a detection accuracy of 91% with a false positive rate below 9%, confirming the system's practical viability for smart city surveillance applications.

Introduction

DumpWatch AI is an automated surveillance system designed to detect and document illegal dumping of household, industrial, and construction waste using CCTV footage. Illegal dumping causes environmental damage, public health risks, groundwater contamination, and significant cleanup costs. Traditional monitoring methods rely on citizen reports and manual CCTV review, which are inefficient and reactive. DumpWatch AI addresses this problem through a combination of computer vision, artificial intelligence, object tracking, OCR, and automated evidence logging.

Background and Motivation

Recent advances in deep learning have enabled real-time video analysis. YOLOv8 provides fast and accurate object detection, while Gemini 2.0 Flash, a Vision Language Model (VLM), adds contextual understanding and reasoning capabilities. By integrating these technologies with EasyOCR and SQLite, DumpWatch AI offers a complete end-to-end solution for detecting, classifying, and recording illegal dumping incidents.

Literature Review

Previous research has explored:

CNN-based image classification for waste detection, though without precise object localization.
Faster R-CNN and other region-based detectors, which improved accuracy but were too slow for real-time use.
YOLO-based detectors, especially YOLOv8, which balance speed and accuracy for surveillance applications.
Object tracking techniques such as SORT and DeepSORT for monitoring scene changes.
Vision Language Models (VLMs) like Gemini and GPT-4 Vision for scene understanding and activity recognition.
OCR-based license plate recognition for traffic monitoring and law enforcement.

However, no prior system combined:

Real-time object detection,
Temporal tracking,
AI-based semantic verification,
License plate recognition, and
Automated incident logging

into a single deployable solution. DumpWatch AI fills this gap.

Methodology

The system operates through five sequential stages:

1. Video Ingestion

CCTV footage is processed using OpenCV.
Every third frame is analyzed to improve efficiency.
Frames are resized to 640 × 640 pixels.
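
The sampling step can be illustrated with a minimal sketch. It assumes that sampling starts at frame 0 and that timestamps are derived from the frame index and the source frame rate; the names sample_frames and frame_timestamp are hypothetical and do not come from the DumpWatch AI code. In a real pipeline the frames would be read with OpenCV (for example via cv2.VideoCapture) and resized with cv2.resize(frame, (640, 640)) before detection.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

Frame = TypeVar("Frame")

FRAME_STRIDE = 3  # "every third frame", as stated in the ingestion stage


def sample_frames(frames: Iterable[Frame],
                  stride: int = FRAME_STRIDE) -> Iterator[Tuple[int, Frame]]:
    """Yield (index, frame) for every `stride`-th frame, starting at index 0.

    Starting at index 0 is an assumption; the paper only states the stride.
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    for index, frame in enumerate(frames):
        if index % stride == 0:
            yield index, frame


def frame_timestamp(index: int, fps: float) -> float:
    """Convert a frame index into seconds from the start of the video."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return index / fps
```

Keeping the original frame index alongside each sampled frame lets later stages convert the 5-second persistence rule into wall-clock time regardless of the stride.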

2. Object Detection and Tracking

YOLOv8m detects:
- People
- Vehicles (cars, trucks, buses, motorcycles)
- Potential waste objects (bags, bottles, suitcases, chairs, etc.)
ByteTrack assigns unique IDs to tracked objects.

3. Temporal Persistence Analysis

An object is flagged as a potential dumping event when all of the following conditions hold:

It remains stationary for more than 5 seconds.
The associated person or vehicle leaves the scene.
The object's position changes by less than 15 pixels.

This prevents false alarms caused by temporarily placed objects.
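
One way to express the persistence rule is sketched below. This is an illustration under stated assumptions, not the authors' implementation: drift is measured as the Euclidean distance of the bounding-box centre from the position where the object first came to rest, the clock only runs while the associated person or vehicle is absent, and the caller supplies the owner_in_scene flag (how an object is associated with a person or vehicle is not specified in the paper). The class name PersistenceTimer is hypothetical.

```python
from dataclasses import dataclass
from math import hypot
from typing import Optional, Tuple

PERSISTENCE_SECONDS = 5.0  # "more than 5 seconds"
MAX_DRIFT_PIXELS = 15.0    # "changes by less than 15 pixels"


@dataclass
class PersistenceTimer:
    """Per-object timer; one instance per tracked object ID (assumed design)."""
    anchor: Optional[Tuple[float, float]] = None
    clock_start: float = 0.0

    def update(self, center: Tuple[float, float], timestamp: float,
               owner_in_scene: bool) -> bool:
        """Record one observation; return True when the object should be flagged."""
        moved = (
            self.anchor is None
            or hypot(center[0] - self.anchor[0],
                     center[1] - self.anchor[1]) >= MAX_DRIFT_PIXELS
        )
        if moved:
            # New object or significant movement: re-anchor at this position.
            self.anchor = center
        if moved or owner_in_scene:
            # The clock restarts while the object moves or the owner is present.
            self.clock_start = timestamp
        return (timestamp - self.clock_start) > PERSISTENCE_SECONDS
```

Under this reading, an object is flagged only after it has stayed within 15 pixels of its resting position for more than 5 seconds since its owner was last seen.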

4. AI Verification

When a potential dumping event is detected:

A cropped image is sent to Gemini 2.0 Flash.
The model classifies:
- Event type (household waste, industrial waste, furniture, construction debris, etc.)
- Severity (Low, Medium, High)
- Incident summary

This step filters ambiguous detections and enriches incident records.
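
The paper does not give the prompt or the response format used with Gemini 2.0 Flash. The sketch below therefore assumes that the model is asked to reply with a JSON object; the field names (is_dumping, event_type, severity, summary) and the function parse_verdict are hypothetical. It shows only how a reply could be validated before it is written to the incident record, and it does not call the Gemini API.

```python
import json
from dataclasses import dataclass
from typing import Optional

SEVERITIES = {"LOW", "MEDIUM", "HIGH"}  # severity levels named in the paper


@dataclass
class Verdict:
    is_dumping: bool
    event_type: str
    severity: str
    summary: str


def parse_verdict(raw_reply: str) -> Optional[Verdict]:
    """Parse an assumed JSON reply; return None if it is malformed or incomplete."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    severity = str(data.get("severity", "")).upper()
    event_type = str(data.get("event_type", "")).strip()
    if severity not in SEVERITIES or not event_type:
        return None
    return Verdict(
        # Only a JSON `true` counts as confirmation (hypothetical field).
        is_dumping=data.get("is_dumping") is True,
        event_type=event_type,
        severity=severity,
        summary=str(data.get("summary", "")).strip(),
    )
```

Treating an unparseable reply as "no verdict" rather than as a confirmation keeps the verification stage conservative, which matches its role of filtering ambiguous detections.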

5. License Plate Extraction and Logging

EasyOCR extracts vehicle license plates.
Incident details are stored in an SQLite database.
An annotated video is generated showing alerts and detected objects.

System Architecture

DumpWatch AI consists of five modules:

Video Capture Module
- CCTV input and preprocessing.
Detection & Tracking Module
- YOLOv8 + ByteTrack.
Temporal Analysis Module
- Stationary-object and departure detection logic.
AI Verification Module
- Gemini-based semantic validation.
Evidence Logging Module
- SQLite database storage and video annotation.

Data Flow

CCTV Input → Frame Sampling → YOLOv8 Detection → Temporal Analysis → Gemini Verification → OCR → Database & Video Output

Implementation

Hardware

Google Colab T4 GPU
CCTV/IP Camera
Internet connection
Local storage

Software

Python 3.10
YOLOv8
OpenCV
Gemini 2.0 Flash API
EasyOCR
SQLite

The system is designed for cloud deployment without requiring dedicated local GPU hardware.

Results and Evaluation

The system was tested on 120 surveillance videos, including:

85 actual dumping incidents
35 non-dumping scenarios

Performance Metrics

Metric	Result
Detection Accuracy	91.2%
Precision	89.7%
Recall	92.4%
F1 Score	91.0%
False Positive Rate	8.6%
License Plate Recognition	84.3%
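
For reference, the standard definitions behind these metrics are sketched below in terms of true positives (tp), false positives (fp), true negatives (tn), and false negatives (fn). The paper does not report the confusion matrix, so no counts are assumed here; the function names are illustrative. The reported F1 score can be checked directly from the reported precision and recall: 2 × 0.897 × 0.924 / (0.897 + 0.924) ≈ 0.910.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of flagged events that were real dumping incidents."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of real dumping incidents that were flagged."""
    return tp / (tp + fn)


def false_positive_rate(fp: int, tn: int) -> float:
    """Fraction of non-dumping scenarios that were wrongly flagged."""
    return fp / (fp + tn)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all test videos that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)
```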

Processing Speed

Stage	Average Latency
YOLOv8 Detection	18 ms/frame
Gemini Verification	1.4 s
EasyOCR	210 ms
Database Logging	< 5 ms
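
A rough latency budget can be derived from these figures. The sketch below assumes that detection runs on every sampled frame, that Gemini verification, OCR, and logging run once per flagged event and in sequence, and that the database write takes its 5 ms upper bound; decoding, tracking, and network variability are ignored. The function names are hypothetical.

```python
YOLO_MS_PER_FRAME = 18.0
GEMINI_MS = 1400.0
EASYOCR_MS = 210.0
DB_MS_UPPER_BOUND = 5.0  # reported as "< 5 ms"
FRAME_STRIDE = 3         # every third frame is analysed


def detection_throughput_fps(ms_per_frame: float = YOLO_MS_PER_FRAME) -> float:
    """Analysed frames per second that the detector alone can sustain."""
    return 1000.0 / ms_per_frame


def max_source_fps(ms_per_frame: float = YOLO_MS_PER_FRAME,
                   stride: int = FRAME_STRIDE) -> float:
    """Highest source frame rate the detector can keep up with, given the stride."""
    return detection_throughput_fps(ms_per_frame) * stride


def per_incident_latency_ms() -> float:
    """Upper-bound latency from flagging an event to the logged record."""
    return GEMINI_MS + EASYOCR_MS + DB_MS_UPPER_BOUND
```

Under these assumptions the detector sustains roughly 55 analysed frames per second, and each flagged event adds at most about 1.6 s of verification and logging time, dominated by the Gemini call.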

Event Classification Accuracy

Household Waste: 94.7%
Furniture Dumping: 90.5%
Industrial Debris: 85.7%
Litter/Bottles: 91.7%

Key Findings

The 5-second persistence rule reduced false positives by approximately 34%.
Gemini verification significantly improved detection reliability, especially in low-light conditions.
OCR performance was limited mainly by low-resolution CCTV footage and unfavorable camera angles.

Outputs

The system generates two primary outputs:

1. Annotated Video

Color-coded bounding boxes:
- Green: Persons
- Blue: Vehicles
- Red: Confirmed dumping incidents
Real-time status information.
Red alert overlay for confirmed violations.
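
When the boxes are drawn with OpenCV, colours are given in BGR order rather than RGB, which is a common source of swapped red and blue boxes. The mapping below is a small sketch of the colour scheme listed above; the category keys, the function name, and the white fallback for unlisted categories are assumptions.

```python
from typing import Dict, Tuple

BGR = Tuple[int, int, int]

# Colour scheme from the paper, written in OpenCV's BGR channel order.
BOX_COLORS_BGR: Dict[str, BGR] = {
    "person": (0, 255, 0),             # green
    "vehicle": (255, 0, 0),            # blue
    "confirmed_dumping": (0, 0, 255),  # red
}

# Hypothetical fallback; the paper does not state a colour for other objects.
FALLBACK_BGR: BGR = (255, 255, 255)


def box_color(category: str) -> BGR:
    """Return the BGR colour for a category, or the fallback if unlisted."""
    return BOX_COLORS_BGR.get(category, FALLBACK_BGR)
```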

2. Incident Database

Each record contains:

Incident ID
Timestamp
Event Type
Severity Level
AI-generated Summary
License Plate Number (if available)
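
A record with these fields could be stored as sketched below, using Python's built-in sqlite3 module. The table name, column names, column types, and the UTC ISO 8601 timestamp format are assumptions; the paper lists only the fields.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional

# Hypothetical schema covering the six fields listed above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS incidents (
    incident_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp     TEXT NOT NULL,
    event_type    TEXT NOT NULL,
    severity      TEXT NOT NULL CHECK (severity IN ('LOW', 'MEDIUM', 'HIGH')),
    summary       TEXT NOT NULL,
    license_plate TEXT
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the incidents table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def log_incident(conn: sqlite3.Connection, event_type: str, severity: str,
                 summary: str, license_plate: Optional[str] = None) -> int:
    """Insert one incident and return its generated incident ID."""
    timestamp = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT INTO incidents "
            "(timestamp, event_type, severity, summary, license_plate) "
            "VALUES (?, ?, ?, ?, ?)",
            (timestamp, event_type, severity, summary, license_plate),
        )
    return cursor.lastrowid
```

The license plate column is nullable because a plate is recorded only when OCR succeeds, and parameterised queries keep model-generated summaries from being interpreted as SQL.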

Conclusion

This paper presented DumpWatch AI, an end-to-end automated illegal dumping detection system that integrates YOLOv8 object tracking, temporal persistence analysis, Gemini 2.0 Flash semantic verification, EasyOCR license plate recognition, and SQLite-based evidence logging. The system achieves a detection accuracy of 91.2% and an F1-score of 91.0% on a representative test dataset, demonstrating practical viability for real-world urban surveillance deployment. The two-stage detection architecture, combining fast visual detection with reasoning-layer verification, represents a scalable approach to intelligent video analytics that can be adapted to related surveillance tasks beyond illegal dumping.

Future enhancements planned for DumpWatch AI include:

- Cloud integration with AWS S3 or Google Cloud Storage for centralised incident archiving and remote dashboard access
- Predictive analytics to identify high-risk dumping locations and time windows based on historical incident patterns
- Edge deployment on NVIDIA Jetson Orin for on-premise, low-latency processing without cloud dependency
- Multi-camera support with spatial incident correlation across overlapping fields of view
- Mobile application for field officers enabling real-time incident viewing and enforcement workflow integration
- Fine-tuning YOLOv8 on a domain-specific illegal dumping dataset to improve detection of context-specific waste categories

References

[1] G. Jocher, A. Chaurasia, and J. Qiu, "Ultralytics YOLOv8," GitHub repository, Ultralytics, 2023. [Online]. Available: https://github.com/ultralytics/ultralytics
[2] Google DeepMind, "Gemini 2.0 Flash: A Multimodal Vision Language Model," Google AI Technical Report, 2024. [Online]. Available: https://deepmind.google/technologies/gemini
[3] J. Baek, "EasyOCR: Ready-to-Use OCR with 80+ Supported Languages," GitHub repository, 2020. [Online]. Available: https://github.com/JaidedAI/EasyOCR
[4] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, 2016. https://doi.org/10.1109/CVPR.2016.90
[5] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017. https://doi.org/10.1109/TPAMI.2016.2577031
[6] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," in Proc. IEEE CVPR, pp. 779–788, 2016. https://arxiv.org/abs/1506.02640
[7] Z. Zivkovic, "Improved Adaptive Gaussian Mixture Model for Background Subtraction," in Proc. IEEE International Conference on Pattern Recognition, 2004. https://doi.org/10.1109/ICPR.2004.1333992
[8] G. Farneback, "Two-Frame Motion Estimation Based on Polynomial Expansion," in Proc. Scandinavian Conference on Image Analysis, pp. 363–370, 2003.
[9] N. Wojke, A. Bewley, and D. Paulus, "Simple Online and Realtime Tracking with a Deep Association Metric," in Proc. IEEE International Conference on Image Processing (ICIP), pp. 3645–3649, 2017. https://arxiv.org/abs/1703.07402
[10] OpenAI, "GPT-4 Technical Report," arXiv preprint arXiv:2303.08774, 2023. https://arxiv.org/abs/2303.08774
[11] H. Li, P. Wang, and C. Shen, "Towards End-to-End Car License Plate Detection and Recognition with Deep Neural Networks," IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 3, pp. 1126–1136, 2019. https://doi.org/10.1109/TITS.2018.2847291
[12] OpenCV, "Open Source Computer Vision Library," 2024. [Online]. Available: https://opencv.org
[13] N. Aloysius and M. Geetha, "A Review on Deep Convolutional Neural Networks," in Proc. IEEE International Conference on Communication and Signal Processing, pp. 0588–0592, 2017.
[14] W. Luo, J. Xing, A. Milan, X. Zhang, W. Liu, and T. K. Kim, "Multiple Object Tracking: A Literature Review," Artificial Intelligence, vol. 293, 2021. https://doi.org/10.1016/j.artint.2020.103448
[15] A. Bochkovskiy, C. Y. Wang, and H. Y. M. Liao, "YOLOv4: Optimal Speed and Accuracy of Object Detection," arXiv preprint arXiv:2004.10934, 2020. https://arxiv.org/abs/2004.10934
[16] V. Lepetit, F. Moreno-Noguer, and P. Fua, "EPnP: An Accurate O(n) Solution to the PnP Problem," International Journal of Computer Vision, vol. 81, pp. 155–166, 2009.
[17] T. Lin, M. Maire, S. Belongie et al., "Microsoft COCO: Common Objects in Context," in Proc. European Conference on Computer Vision (ECCV), pp. 740–755, 2014. https://arxiv.org/abs/1405.0312
[18] Government of India, "Solid Waste Management Rules," Ministry of Environment, Forest and Climate Change, 2016. https://www.moef.gov.in
[19] S. Agarwal, A. Tarai, and P. Bhatt, "Smart Waste Management System Using IoT and Machine Learning," International Journal of Intelligent Systems and Applications, vol. 14, no. 3, pp. 45–57, 2022.
[20] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation," in Proc. IEEE CVPR, pp. 580–587, 2014. https://arxiv.org/abs/1311.2524

Copyright

Copyright © 2026 Mrs. Sankari, Manish R, Monish A. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Paper Id : IJRASET80955

Publish Date : 2026-04-24

ISSN : 2321-9653

Publisher Name : IJRASET
